Phone duration modeling using clustering of rich contexts

Authors

  • Tanel Alumäe
  • Rena Nemoto
Abstract

This paper describes a phone duration model applied to speech recognition. The model is based on a decision tree that finds clusters of phones in various contexts that tend to have similar durations. Wide contexts with rich linguistic and phonetic features are used. To better model varying and non-stationary speaking rates, the contextual features also include the observed duration values of previous phones. For each resulting phone cluster, a log-normal distribution of duration is estimated. The resulting decision tree and the log-normal distributions are used to calculate likelihoods of phone durations in N-best lists. Experiments on two Estonian recognition tasks show a small but significant improvement in speech recognition accuracy.
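The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the decision tree has already assigned each phone to a cluster; the function names, the cluster identifiers, and the variance floor are hypothetical choices made for illustration and are not taken from the paper. Only the per-cluster log-normal fit and the summed log-likelihood of a hypothesis are shown.

```python
import math


def fit_lognormal(durations_by_cluster):
    """Estimate (mu, sigma) of the log-duration for each cluster (maximum likelihood)."""
    params = {}
    for cluster, durations in durations_by_cluster.items():
        logs = [math.log(d) for d in durations]
        mu = sum(logs) / len(logs)
        var = sum((x - mu) ** 2 for x in logs) / len(logs)
        # Floor sigma so that tiny or constant clusters do not yield a
        # degenerate distribution (the floor value is an assumption).
        params[cluster] = (mu, max(math.sqrt(var), 1e-3))
    return params


def log_likelihood(duration, mu, sigma):
    """Log-density of a log-normal distribution at a positive duration."""
    z = (math.log(duration) - mu) / sigma
    return -math.log(duration * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z


def score_hypothesis(phones, params):
    """Sum duration log-likelihoods over (cluster_id, duration) pairs of one hypothesis."""
    return sum(log_likelihood(d, *params[c]) for c, d in phones)


if __name__ == "__main__":
    # Hypothetical training durations (in frames) for two clusters.
    training = {"c1": [5, 6, 7, 8], "c2": [10, 12, 15, 11]}
    params = fit_lognormal(training)
    # Two competing hypotheses with different phone alignments.
    hyp_a = [("c1", 6), ("c2", 12)]
    hyp_b = [("c1", 20), ("c2", 3)]
    print(score_hypothesis(hyp_a, params), score_hypothesis(hyp_b, params))
```

In an N-best rescoring setup, a score like this would typically be combined with the acoustic and language model scores using a tuned weight; how the paper combines them is not stated in the abstract.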

Similar resources

Analysis of Duration Prediction Accuracy in HMM-Based Speech Synthesis

Appropriate phoneme durations are essential for high quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper we analyze the accuracy ...

Duration prediction using multi-level model for GPR-based speech synthesis

This paper introduces frame-based Gaussian process regression (GPR) into phone/syllable duration modeling for Thai speech synthesis. The GPR model is designed for predicting frame-level acoustic features using corresponding frame information, which includes relative position in each unit of utterance structure and linguistic information such as tone type and part of speech. Although the GPR-base...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Context dependent phoneme duration modeling with tree-based state tying

In this paper, we propose phoneme duration modeling methods with tree-based state tying. Two kinds of phone duration modeling methods are suggested. The first is a context-independent phoneme duration model in which duration parameters are stored in each phone. The second is a context-dependent duration model in which duration parameters are stored in each state being shared by context dependent ph...

Analyse Power Consumption by Mobile Applications Using Fuzzy Clustering Approach

With advances in mobile technology and its use in every facet of life, the popularity of mobile devices has grown exponentially. The biggest constraint on the utility of mobile devices is that they are battery powered. Designers always aim to minimize a device's size and weight, which limits the size and capacity of the battery used in a mobile phone. In this paper analysis of th...

Publication date: 2013